The source coding game with a cheating switcher 



Hari Palaiyanur Cheng Chang Anant Sahai 

hpalaiya@eecs.berkeley.edu cchang@eecs.berkeley.edu sahai@eecs.berkeley.edu 

Department of Electrical Engineering and Computer Sciences 
University of California at Berkeley 
Berkeley, CA 94720 






Abstract — Berger's paper 'The Source Coding Game', IEEE 
Trans. Inform. Theory, 1971, considers the problem of finding the 
rate-distortion function for an adversarial source comprised of 
multiple known IID sources. The adversary, called the 'switcher', 
was allowed only causal access to the source realizations and the 
rate-distortion function was obtained through the use of a type 
covering lemma. In this paper, the rate-distortion function of 
the adversarial source is described, under the assumption that 
the switcher has non-causal access to all source realizations. The 
proof utilizes the type covering lemma and simple conditional, 
random 'switching' rules. The rate-distortion function is once 
again the maximization of the R(D) function for a region of 
attainable IID distributions. 

I. Introduction 

The rate distortion function, R(D), specifies the number 
of codewords, on an exponential scale, needed to represent a 
source to within a distortion D. Shannon [1] showed that for an 
additive distortion function d and a known discrete source that 
produces independent and identically distributed (IID) letters 
according to a distribution p, 



$$R(D) = R_p(D) = \min_{W \,:\, \sum_{x,y} p(x) W(y|x) d(x,y) \le D} I(p, W) \tag{1}$$

where $I(p, W)$ is the mutual information for an input distribution
$p$ and probability transition matrix $W$.

Sakrison [2] studied the rate distortion function for the class 
of compound sources. That is, the source is assumed to come 
from a known set of distributions and is fixed for all time. If G 
is the set of possible sources, Sakrison showed that planning 
for the worst case source is both necessary and sufficient in 
the discrete memoryless source case. Hence, for compound 
sources, 

$$R(D) = \max_{p \in G} R_p(D) \tag{2}$$

In Berger's 'source coding game' [3], the source is assumed 
to be an adversarial player called the 'switcher' in a statistical 
game. In this setup, the switcher is allowed to choose any 
source from G at any time, but must do so in a causal manner 
without access to the current step's source realizations. The 
conclusion of [3] is that under this scenario, 



$$R(D) = \max_{p \in \bar{G}} R_p(D) \tag{3}$$

where $\bar{G}$ is the convex hull of $G$. In his conclusion, Berger
poses the question of what happens to the rate-distortion 



function when the rules of the game are tilted in favor of 
the switcher. Suppose that the switcher were given access to 
the source realizations before having to choose the switch 
positions. The main result of this paper is that under these 
rules, 

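$$R(D) = \max_{p \in \mathcal{C}} R_p(D) \tag{4}$$

where

$$\mathcal{C} = \left\{ p \in \mathcal{P} \,:\, \sum_{i \in \mathcal{V}} p(i) \ge \prod_{l=1}^{m} \sum_{i \in \mathcal{V}} p_l(i), \ \forall\, \mathcal{V} \text{ such that } \mathcal{V} \subseteq \mathcal{X} \right\} \tag{5}$$

Here, the $p_l$ are the distributions of the $m$ sources and $\mathcal{P}$ is
the set of all probability distributions on $\mathcal{X}$.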

Section II sets up the notation for the paper, and is followed
by a description of the source coding game in Section III. The
main result is stated in Section IV and an example illustrating
the main ideas is given in Section V. The proofs are located in
Section VI and some concluding remarks are made in Section VII.

II. Definitions 

We work in essentially the same setup as Berger's source
coding game [3], and with most of the same notation. There are
two finite alphabets $\mathcal{X}$ and $\mathcal{Y}$. Without loss of generality,
$\mathcal{X} = \{1, 2, \ldots, |\mathcal{X}|\}$ is the source alphabet and
$\mathcal{Y} = \{1, 2, \ldots, |\mathcal{Y}|\}$ is the reproduction alphabet.
Let $\mathbf{x} = (x_1, \ldots, x_n)$ denote an
arbitrary vector from $\mathcal{X}^n$ and $\mathbf{y} = (y_1, \ldots, y_n)$ an arbitrary
vector from $\mathcal{Y}^n$. When needed, $\mathbf{x}^k = (x_1, \ldots, x_k)$ will be
used to denote the first $k$ symbols in the vector $\mathbf{x}$.

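Let $d : \mathcal{X} \times \mathcal{Y} \to [0, \infty)$ be a distortion measure (any
nonnegative function; see footnote 1) on the product set $\mathcal{X} \times \mathcal{Y}$. Then define
$d_n : \mathcal{X}^n \times \mathcal{Y}^n \to [0, \infty)$ for $n \ge 1$ to be

$$d_n(\mathbf{x}, \mathbf{y}) = \frac{1}{n} \sum_{k=1}^{n} d(x_k, y_k) \tag{6}$$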



Footnote 1: We could allow for infinite distortions and require that the probability that
the distortion exceeds $D + \epsilon$ go to zero for all $\epsilon > 0$. The main result would
hold in this setup as well.

Let $\mathcal{P}$ be the set of probability distributions on $\mathcal{X}$, $\mathcal{P}_n$ the
set of types of length-$n$ strings from $\mathcal{X}$, and let $\mathcal{W}$ be the
set of probability transition matrices from $\mathcal{X}$ to $\mathcal{Y}$. The rate
distortion function of $p \in \mathcal{P}$ with respect to distortion measure
$d$ is defined to be

$$R_p(D) = \min_{w \in \mathcal{W}(p, D)} I(p, w) \tag{7}$$



where

$$\mathcal{W}(p, D) = \left\{ w \in \mathcal{W} \,:\, \sum_{i=1}^{|\mathcal{X}|} \sum_{j=1}^{|\mathcal{Y}|} p(i) w(j|i) d(i,j) \le D \right\} \tag{8}$$



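and $I(p, w)$ is the mutual information (see footnote 2)

$$I(p, w) = \sum_{i=1}^{|\mathcal{X}|} \sum_{j=1}^{|\mathcal{Y}|} p(i) w(j|i) \log_2 \frac{w(j|i)}{\sum_{i'=1}^{|\mathcal{X}|} p(i') w(j|i')} \tag{9}$$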



The only interesting domain of values for $R_p(D)$ is
$D \in (D_{\min}(p), D_{\max}(p))$ where

$$D_{\min}(p) = \sum_{i=1}^{|\mathcal{X}|} p(i) \min_j d(i,j) \tag{10}$$

$$D_{\max}(p) = \min_j \sum_{i=1}^{|\mathcal{X}|} p(i) d(i,j) \tag{11}$$
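The quantities in (9), (10) and (11) are easy to evaluate numerically for small alphabets. The following is a minimal sketch, not part of the original paper; the function names and the list-of-lists layout (`w[i][j]` standing for $w(j|i)$, `d[i][j]` for $d(i,j)$, letters indexed from 0) are assumptions made only for illustration.

```python
import math

def mutual_information(p, w):
    """I(p, w) in bits as in (9); p[i] is the source pmf, w[i][j] = w(j|i)."""
    ny = len(w[0])
    # Output marginal q(j) = sum_i p(i) w(j|i), the denominator inside the log.
    q = [sum(p[i] * w[i][j] for i in range(len(p))) for j in range(ny)]
    total = 0.0
    for i, pi in enumerate(p):
        for j in range(ny):
            if pi > 0 and w[i][j] > 0:
                total += pi * w[i][j] * math.log2(w[i][j] / q[j])
    return total

def d_min(p, d):
    """D_min(p) as in (10): each letter is mapped to its best reproduction."""
    return sum(pi * min(d[i]) for i, pi in enumerate(p))

def d_max(p, d):
    """D_max(p) as in (11): the best single reproduction letter."""
    ny = len(d[0])
    return min(sum(pi * d[i][j] for i, pi in enumerate(p)) for j in range(ny))
```

For instance, with Hamming distortion on a binary alphabet and a uniform source, the noiseless channel gives one bit of mutual information, while $D_{\min} = 0$ and $D_{\max} = 1/2$.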



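Let $\mathcal{B} = \{\mathbf{y}_1, \ldots, \mathbf{y}_K\}$ be a codebook of length-$n$ vectors
in $\mathcal{Y}^n$. Define

$$d_n(\mathbf{x}; \mathcal{B}) = \min_{\mathbf{y} \in \mathcal{B}} d_n(\mathbf{x}, \mathbf{y}) \tag{12}$$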



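If $\mathcal{B}$ is used to represent an IID source with distribution $p$,
then the average distortion of $\mathcal{B}$ is defined to be

$$d(\mathcal{B}) = \sum_{\mathbf{x} \in \mathcal{X}^n} P(\mathbf{x}) d_n(\mathbf{x}; \mathcal{B}) = E[d_n(\mathbf{x}; \mathcal{B})] \tag{13}$$

where

$$P(\mathbf{x}) = \prod_{k=1}^{n} p(x_k) \tag{14}$$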



Let $K(n, D)$ be the minimum number of codewords needed
in a codebook $\mathcal{B} \subseteq \mathcal{Y}^n$ so that $d(\mathcal{B}) \le D$. Then, Shannon's
Rate-Distortion Theorem ([1], [4]) says that if the source is
IID with distribution $p$,

$$\lim_{n \to \infty} \frac{1}{n} \log_2 K(n, D) = R_p(D) \tag{15}$$

III. The Source Coding Game



We suppose as in Berger's paper that a 'switcher' is a player
in a two person game with access to the position of a switch
which can be in one of $m$ positions. The switch position $l$, $1 \le l \le m$,
corresponds to a memoryless source with distribution
$p_l(\cdot)$ that is independent of all the other sources (see footnote 3). Let
$\mathbf{s} = (s_1, s_2, \ldots, s_n)$ be the vector of switch positions chosen by
the switcher. Let $x_k$ be the switcher's output at time $k$ and let
$x_{l,k}$ be the output of the $l$th source at time $k$. When needed,
$\mathbf{x}_l$ will denote the block of $n$ symbols for the $l$th source.

Footnote 2: We use $\log_2$ in the report, but any base can be used.

Footnote 3: There can be multiple copies of the same source. For example, there can
be any number of copies of a Bernoulli(1/10) source, so long as they are
all independent. In that sense, the switcher has access to a list of $m$ sources,
rather than a set of $m$ different distributions.

Fig. 1. The source coding game.

The other person in the game is called the 'coder'. The
coder's goal is to construct a codebook of minimal size to
ensure the average distortion between the switcher's output
and its reconstruction in the codebook is at most $D$. Fix $n$ and
$D > 0$. Let $\mathcal{B}$ denote the codebook chosen by the coder,
and $d_n(\mathbf{x}; \mathcal{B})$ be the distortion between a vector $\mathbf{x}$ and the
best reproduction of $\mathbf{x}$ in $\mathcal{B}$, in the sense of least distortion.
The payoff of the game is the average distortion, which for a
particular switching strategy is

$$E[d(\mathbf{x}; \mathcal{B})] = \sum_{\mathbf{x} \in \mathcal{X}^n} P_S(\mathbf{x}) d_n(\mathbf{x}; \mathcal{B}) \tag{16}$$



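Here $P_S(\mathbf{x})$ is the probability of the switcher outputting
the sequence $\mathbf{x}$ averaged over any randomness the switcher
chooses to use, as well as the randomness in the sources. Let
$P(\mathbf{s}, \mathbf{x})$ be the probability of the switcher using a switching
vector $\mathbf{s}$ and outputting a string $\mathbf{x}$. Then,

$$P_S(\mathbf{x}) = \sum_{\mathbf{s} \in \{1, \ldots, m\}^n} P(\mathbf{s}, \mathbf{x}) \tag{17}$$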



In Berger's original game, the coder chooses a codebook
that is revealed to the switcher. The switcher must then choose
the switch position at every integer time $k$ without access
to the actual letters that the sources produce at that time.
The switcher, however, has access to the previous outputs of
the switch. So in [3], an admissible joint probability rule for
$P(\mathbf{s}, \mathbf{x})$ is of the form

$$P(\mathbf{s}, \mathbf{x}) = \prod_{k=1}^{n} P(s_k \mid \mathbf{s}^{k-1}, \mathbf{x}^{k-1}) \, p_{s_k}(x_k) \tag{18}$$



In this discussion, we consider the case when the switcher
gets to see the outputs of the $m$ sources and then has to output
one of the letters that the sources produced.
The switcher outputs a letter, $x_k$, which must come from the
(possibly proper) subset of $\mathcal{X}$, $\{x_{1,k}, \ldots, x_{m,k}\}$. Hence, for
this 'cheating' switcher, allowable strategies are of the form

$$P(\mathbf{s}, \mathbf{x} \mid \mathbf{x}_1, \ldots, \mathbf{x}_m) = P(\mathbf{s} \mid \mathbf{x}_1, \ldots, \mathbf{x}_m) \, 1(x_k = x_{s_k, k}, \ 1 \le k \le n) \tag{19}$$



Since the sources are still IID,

$$P(\mathbf{x}_1, \ldots, \mathbf{x}_m) = \prod_{l=1}^{m} \prod_{k=1}^{n} p_l(x_{l,k}) \tag{20}$$
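The structure of (19) and (20) can be illustrated with a small simulation: at every time step each source emits a letter independently, and the switcher then picks one of the emitted letters. This is only a sketch under stated assumptions. The function names, the `rule` callback and the two binary sources used in the usage note are hypothetical and chosen for illustration; letters are indexed from 0.

```python
import random

def simulate_cheating_switcher(source_pmfs, rule, n, seed=0):
    """Draw n time steps. At each step every source emits one letter and
    rule(emitted, rng) picks one of them. Returns the switcher's outputs."""
    rng = random.Random(seed)
    letters = list(range(len(source_pmfs[0])))
    out = []
    for _ in range(n):
        # One letter per source, independent across sources and time, as in (20).
        emitted = [rng.choices(letters, weights=pmf)[0] for pmf in source_pmfs]
        choice = rule(emitted, rng)
        # The indicator in (19): the output must be one of the emitted letters.
        assert choice in emitted
        out.append(choice)
    return out

def prefer_one(emitted, rng):
    """A memoryless rule: output letter 1 whenever some source produced it."""
    return 1 if 1 in emitted else emitted[0]
```

With two binary sources that emit 1 with probabilities 1/3 and 1/4, `simulate_cheating_switcher([[2/3, 1/3], [3/4, 1/4]], prefer_one, 20000)` should produce 1's about half the time, since both sources emit 0 with probability (2/3)(3/4) = 1/2.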



Define the minimum number of codewords needed by the
coder to guarantee average distortion $D$ as $M(n, D)$:

$$M(n, D) = \min \left\{ |\mathcal{B}| \,:\, \mathcal{B} \subseteq \mathcal{Y}^n, \ E[d(\mathbf{x}; \mathcal{B})] \le D \text{ for all allowable switcher strategies} \right\} \tag{21}$$

We are interested in the exponential rate of growth of
$M(n, D)$ with $n$. Define

$$R(D) = \lim_{n \to \infty} \frac{1}{n} \log_2 M(n, D) \tag{22}$$



Let $G = \{p_1(\cdot), \ldots, p_m(\cdot)\}$ be the set of $m$ distributions
on $\mathcal{X}$ the switcher has access to. Let $\bar{G}$ be the convex hull of
$G$. Then let

$$R^*(D) = \max_{p \in \bar{G}} R_p(D)$$

The conclusion of [3] is that R(D) = R*(D) when the 
switcher is not allowed to witness the source realizations until 
committing to a switch position. 

IV. Main Result 

The main result is the determination of R(D) in the case 
when the switcher gets to see the entire block of mn source 
outputs ahead of choosing the switching sequence. 

Theorem 1: Let the switcher 'cheat' and have access to the 
n outputs of all m sources before choosing a symbol for each 
time k. Then, 



$$R(D) = \tilde{R}(D) = \max_{p \in \mathcal{C}} R_p(D) \tag{23}$$

where $\mathcal{C}$ is defined in (5).



Here, we have defined $\tilde{R}(D) = \max_{p \in \mathcal{C}} R_p(D)$. The
theorem's conclusion is that when the switcher is allowed to
'cheat', $R(D) = \tilde{R}(D)$. The number of constraints in the set
$\mathcal{C}$ is exponential in the size of $\mathcal{X}$. Depending on the source
distributions, a large number of these constraints could be
inactive. Unfortunately, $R_p(D)$ is generally not concave in $p$
for a fixed $D$, so computation of $\tilde{R}(D)$ may not be easy.
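For a fixed $p$, however, $R_p(D)$ itself can be evaluated numerically. One standard way, not used in the paper and named here only as an illustration, is the Blahut-Arimoto iteration, which traces the curve through a slope parameter. The sketch below returns one point $(D, R)$ per value of the hypothetical parameter `beta`; the function name and data layout are assumptions. For small alphabets, $\tilde{R}(D)$ could then be approximated by searching over a grid of $p \in \mathcal{C}$.

```python
import math

def blahut_arimoto_point(p, d, beta, iters=2000):
    """One point (D, R) on the R_p(D) curve, with R in bits.
    p[i]: source pmf, d[i][j]: distortion, beta > 0: slope parameter."""
    nx, ny = len(p), len(d[0])
    q = [1.0 / ny] * ny  # reproduction marginal, initialised to uniform
    for _ in range(iters):
        # Test channel w(j|i) proportional to q(j) * exp(-beta * d(i, j)).
        w = []
        for i in range(nx):
            row = [q[j] * math.exp(-beta * d[i][j]) for j in range(ny)]
            z = sum(row)
            w.append([r / z for r in row])
        # Re-estimate the reproduction marginal induced by w.
        q = [sum(p[i] * w[i][j] for i in range(nx)) for j in range(ny)]
    D = sum(p[i] * w[i][j] * d[i][j] for i in range(nx) for j in range(ny))
    R = sum(p[i] * w[i][j] * math.log2(w[i][j] / q[j])
            for i in range(nx) for j in range(ny) if p[i] * w[i][j] > 0)
    return D, R
```

For a binary source under Hamming distortion the returned pair can be checked against the closed form $R_p(D) = h(p(1)) - h(D)$, where $h$ is the binary entropy function.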

Qualitatively, allowing the switcher to 'cheat' gives access
to distributions $p \in \mathcal{C}$ which may not be in $\bar{G}$. Quantitatively, the
conditions placed on the distributions in $\mathcal{C}$ are precisely those
that restrict the switcher from producing symbols that do not
occur often enough on average. For example, let $\mathcal{V} = \{1\}$.
Then for every $p \in \mathcal{C}$,

$$p(1) \ge \prod_{l=1}^{m} p_l(1)$$



Since the sources are independent, $\prod_{l=1}^{m} p_l(1)$ is the probability
that all $m$ sources produce the letter 1 at a given
time. In this case, the switcher has no option but to output
the letter 1, hence any distribution the switcher mimics must
have $p(1) \ge \prod_{l=1}^{m} p_l(1)$. The same logic can be applied to all
subsets $\mathcal{V}$ of $\mathcal{X}$.

Fig. 2. The binary distributions the switcher can mimic. $\bar{G}$ is the set
of distributions the switcher can mimic without cheating, and $\mathcal{C}$ is the set
attainable with cheating.
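Because the definition of $\mathcal{C}$ in (5) is a finite list of linear inequalities, membership can be tested by brute force over all nonempty subsets of the alphabet. The following is a minimal sketch for small alphabets; the function name, the tolerance `tol` and the 0-based letter indexing are assumptions made for illustration.

```python
from itertools import combinations
from math import prod

def in_cheating_region(p, source_pmfs, tol=1e-12):
    """Check the constraints of (5) for every nonempty subset V of the alphabet."""
    letters = range(len(p))
    for size in range(1, len(p) + 1):
        for V in combinations(letters, size):
            lhs = sum(p[i] for i in V)
            # Probability that every source emits a letter inside V.
            rhs = prod(sum(pmf[i] for i in V) for pmf in source_pmfs)
            if lhs < rhs - tol:
                return False
    return True
```

The loop visits $2^{|\mathcal{X}|} - 1$ subsets, which matches the remark that the number of constraints is exponential in the alphabet size.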

As commented in Section V of [3], $R(D) = R^*(D)$ if
$R^*(D) = \max_{p \in \mathcal{P}} R_p(D)$. Before giving the proof of the
result, an example is presented.

V. An Example 

Suppose the switcher has access to two IID binary sources. 
Source 1 outputs 1 with probability 1/3 and source 2 outputs 
1 with probability 1/4. Then, since the sources are IID across 
time and independent of each other, for any time k, 



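$$P(x_{1,k} = x_{2,k} = 0) = \frac{2}{3} \cdot \frac{3}{4} = \frac{1}{2} \tag{24}$$

Similarly,

$$P(x_{1,k} = x_{2,k} = 1) = \frac{1}{3} \cdot \frac{1}{4} = \frac{1}{12} \tag{25}$$

Hence,

$$P(\{x_{1,k}, x_{2,k}\} = \{0, 1\}) = 1 - \frac{1}{2} - \frac{1}{12} = \frac{5}{12} \tag{26}$$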



If at time $k$ the switcher has the option of choosing either
0 or 1, suppose the switcher chooses 1 with probability $f_1$. This
strategy is memoryless, but it is an allowable strategy for the
'cheating' switcher. The coder then sees an IID binary source
with a probability of a 1 occurring being equal to:

$$p(1) = \frac{1}{12} + \frac{5}{12} f_1 \tag{27}$$



By using $f_1$ as a parameter, the switcher can produce 1's with a
probability between 1/12 and 1/2. The attainable distributions
are shown in Figure 2. This kind of memoryless, 'conditional'
switching strategy will be used for half of the proof of the main
result. If the distortion measure is Hamming distortion, the
switcher will choose $f_1 = 1$ and produce a Bernoulli 1/2
process, since among binary sources the uniform one has the largest
rate-distortion function. Regardless of the distortion measure, $\mathcal{C}$ contains all
the distributions on $\mathcal{X}$ that the switcher can mimic.
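The arithmetic of this example, together with the resulting gain for the switcher under Hamming distortion, can be reproduced in a few lines. This is an illustrative sketch only: the helper names are assumptions, and `hamming_rate` uses the standard closed form $R_p(D) = h(p(1)) - h(D)$ for a binary source, valid for $0 \le D \le \min(p(1), 1 - p(1))$.

```python
from fractions import Fraction
from math import log2

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0, 1) else -x * log2(x) - (1 - x) * log2(1 - x)

def hamming_rate(p1, D):
    """h(p1) - h(D): binary source, Hamming distortion, D <= min(p1, 1 - p1)."""
    return h2(p1) - h2(D)

both_zero = Fraction(2, 3) * Fraction(3, 4)   # (24)
both_one = Fraction(1, 3) * Fraction(1, 4)    # (25)
mixed = 1 - both_zero - both_one              # (26)

def p_one(f1):
    """Probability that the switcher outputs 1, as in (27)."""
    return both_one + mixed * f1

# Without cheating, the best the switcher can do is p(1) = 1/3 (an endpoint of
# the convex hull); with cheating it can reach p(1) = 1/2.
rate_without_cheating = hamming_rate(1 / 3, 0.1)
rate_with_cheating = hamming_rate(float(p_one(Fraction(1))), 0.1)
```

At $D = 0.1$ the cheating switcher forces a strictly larger rate, because $h(1/2) = 1 > h(1/3)$.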



VI. Proofs 
A. Achievability for the coder 

First, the main tool of this section is stated. 

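Lemma 1 (Type Covering [3]): Let $\mathcal{P}_n$ denote the set of
types for length-$n$ sequences from $\mathcal{X}$. Let
$S_D(\mathbf{y}) = \{\mathbf{x} \in \mathcal{X}^n : d_n(\mathbf{x}, \mathbf{y}) \le D\}$
be the set of $\mathcal{X}^n$ strings that are within
distortion $D$ of a $\mathcal{Y}^n$ string $\mathbf{y}$. Fix a $p \in \mathcal{P}_n$ and an $\epsilon > 0$.
Then, for $n$ large enough, there exists a codebook
$\mathcal{B} = \{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_M\}$ where
$M \le \exp_2(n(R_p(D) + \epsilon))$ and

$$T_p^n \subseteq \bigcup_{k=1}^{M} S_D(\mathbf{y}_k)$$

where $T_p^n$ is the set of $\mathcal{X}^n$ strings with type $p$.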

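We now show how the coder can get arbitrarily close to
$\tilde{R}(D)$ for large enough $n$. For $\delta > 0$, define $\mathcal{C}_\delta$ as

$$\mathcal{C}_\delta = \left\{ p \in \mathcal{P} \,:\, \sum_{i \in \mathcal{V}} p(i) \ge \prod_{l=1}^{m} \sum_{i \in \mathcal{V}} p_l(i) - \delta, \ \forall\, \mathcal{V} \text{ such that } \mathcal{V} \subseteq \mathcal{X} \right\}$$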

Lemma 2 (Converse for switcher): Let $\epsilon > 0$. For all $n$
sufficiently large,

$$\frac{1}{n} \log_2 M(n, D) \le \tilde{R}(D) + \epsilon$$

Proof: We know $R_p(D)$ is a continuous function of
$p$ ([5]). Because $\mathcal{C}_\delta$ shrinks monotonically (as a set) to $\mathcal{C}$
as $\delta$ decreases to 0, it follows that for all $\epsilon > 0$ there is a
$\delta > 0$ so that

$$\max_{p \in \mathcal{C}_\delta} R_p(D) \le \max_{p \in \mathcal{C}} R_p(D) + \epsilon/2$$

We will have the coder use a codebook such that all $\mathcal{X}^n$
strings with types in $\mathcal{C}_\delta$ are covered within distortion $D$. The
coder can do this for large $n$ with at most $M$ codewords where

$$M \le (n+1)^{|\mathcal{X}|} \exp_2\left(n \max_{p \in \mathcal{C}_\delta} R_p(D)\right) \tag{28}$$

$$\le (n+1)^{|\mathcal{X}|} \exp_2\left(n \left(\max_{p \in \mathcal{C}} R_p(D) + \epsilon\right)\right) \tag{29}$$

Explicitly, this is done by taking a union of the codebooks
provided by the type covering lemma and noting that the
number of types is less than $(n+1)^{|\mathcal{X}|}$. Next, we will show
that the probability of the switcher being able to produce a
string with a type not in $\mathcal{C}_\delta$ goes to 0 exponentially with $n$.
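The polynomial bound on the number of types used above can be checked directly: a type is determined by how the $n$ positions are split among the $|\mathcal{X}|$ letters, so the exact count is a binomial coefficient. The helper below is an illustrative sketch and its name is an assumption.

```python
from math import comb

def num_types(n, alphabet_size):
    """Number of types of length-n strings: the number of ways to split
    n counts among alphabet_size letters (stars and bars)."""
    return comb(n + alphabet_size - 1, alphabet_size - 1)
```

For example, `num_types(10, 3)` is 66, well below the bound $(10+1)^3 = 1331$, so the union of per-type codebooks costs only a polynomial factor in $n$.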

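Consider a type $p \in \mathcal{P}_n \cap (\mathcal{P} - \mathcal{C}_\delta)$. By definition, there is
some $\mathcal{V} \subseteq \mathcal{X}$ such that
$\sum_{i \in \mathcal{V}} p(i) < \prod_{l=1}^{m} \sum_{i \in \mathcal{V}} p_l(i) - \delta$.
Let $\alpha_k(\mathcal{V})$ be the indicator function

$$\alpha_k(\mathcal{V}) = \prod_{l=1}^{m} 1(x_{l,k} \in \mathcal{V})$$

$\alpha_k$ indicates the event that the switcher cannot output a symbol
outside of $\mathcal{V}$ at time $k$. Then $\alpha_k(\mathcal{V})$ is a Bernoulli random
variable with a probability of being 1 equal to
$Q(\mathcal{V}) = \prod_{l=1}^{m} \sum_{i \in \mathcal{V}} p_l(i)$. Note that we can envision
$\alpha_k(\mathcal{V})$, $k = 1, \ldots, n$, as being
a sequence of IID binary random variables with distribution
$q' \triangleq (1 - Q(\mathcal{V}), Q(\mathcal{V}))$.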

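Now for our type $p \in \mathcal{P}_n \cap (\mathcal{P} - \mathcal{C}_\delta)$, we have that for all
strings $\mathbf{x}$ in the type class $T_p$,
$\frac{1}{n} \sum_{k=1}^{n} 1(x_k \in \mathcal{V}) < Q(\mathcal{V}) - \delta$.
Since $\alpha_k(\mathcal{V}) = 1$ forces $x_k \in \mathcal{V}$, the switcher can output such a
string only if $\frac{1}{n} \sum_{k=1}^{n} \alpha_k(\mathcal{V}) < Q(\mathcal{V}) - \delta$.
Let $p'$ be the binary distribution $(1 - Q(\mathcal{V}) + \delta, Q(\mathcal{V}) - \delta)$,
assuming $\delta$ is small enough to make this a distribution (if
not, make $\delta$ small enough). Then $\|p' - q'\|_1 = 2\delta$,
and hence $D(p' \| q') \ge 2\delta^2 / \ln 2$ by Pinsker's inequality. Using
standard types properties [6] gives

$$P\left(\frac{1}{n} \sum_{k=1}^{n} \alpha_k(\mathcal{V}) < Q(\mathcal{V}) - \delta\right) \le \exp_2(-n D(p' \| q')) \le \exp_2(-2n\delta^2 / \ln 2)$$

If we let $E$ be the event that $\mathbf{x}$ has a type which is not in
$\mathcal{C}_\delta$, we just sum over types not in $\mathcal{C}_\delta$ to get

$$P(E) \le \sum_{p \in \mathcal{P}_n \cap (\mathcal{P} - \mathcal{C}_\delta)} \exp_2(-2n\delta^2 / \ln 2) \le (n+1)^{|\mathcal{X}|} \exp_2(-2n\delta^2 / \ln 2) = \exp_2\left(-\frac{n}{\ln 2}\left(2\delta^2 - |\mathcal{X}| \frac{\ln(n+1)}{n}\right)\right)$$

Now let $d^* = \max_{x,y} d(x, y) < \infty$. Then, regardless of the
switcher strategy,

$$E[d(\mathbf{x}; \mathcal{B})] \le D + d^* \cdot \exp_2\left(-\frac{n}{\ln 2}\left(2\delta^2 - |\mathcal{X}| \frac{\ln(n+1)}{n}\right)\right)$$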
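The two inequalities used in this step, the exponential tail bound on the IID indicators and the Pinsker lower bound on the divergence, can be sanity-checked numerically for a binary case. The sketch below is illustrative only; the function names and the sample values $Q(\mathcal{V}) = 0.5$, $\delta = 0.1$, $n = 200$ are assumptions and do not come from the paper.

```python
from math import comb, log, log2

def binary_kl_bits(a, b):
    """D((1-a, a) || (1-b, b)) in bits."""
    def term(x, y):
        return 0.0 if x == 0 else x * log2(x / y)
    return term(a, b) + term(1 - a, 1 - b)

def lower_tail(n, Q, threshold):
    """Exact P(sum_k alpha_k / n < threshold) for IID Bernoulli(Q) alpha_k."""
    return sum(comb(n, j) * Q**j * (1 - Q)**(n - j)
               for j in range(n + 1) if j / n < threshold)

Q, delta, n = 0.5, 0.1, 200
divergence = binary_kl_bits(Q - delta, Q)      # D(p' || q')
pinsker_floor = 2 * delta**2 / log(2)          # lower bound on D(p' || q')
tail = lower_tail(n, Q, Q - delta)
tail_bound = 2 ** (-n * divergence)
```

In this instance the exact tail probability sits below `tail_bound`, and `divergence` is only slightly larger than `pinsker_floor`, which shows that the quadratic dependence on $\delta$ cannot be improved to a linear one.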



So for large $n$ we can get arbitrarily close to distortion $D$
while the rate is at most $\tilde{R}(D) + \epsilon$. Using the fact that the
rate-distortion function is continuous in $D$ gives us that the
coder can achieve at most distortion $D$ on average while the
rate is at most $\tilde{R}(D) + \epsilon$. Since $\epsilon$ is arbitrary, $R(D) \le \tilde{R}(D)$.

■

B. Achievability for the switcher 

This section considers why $R(D) \ge \tilde{R}(D)$. We will show
that the switcher can target any distribution $p \in \mathcal{C}$ and produce
a sequence of IID symbols with distribution $p$. In particular, the
switcher can target the distribution that yields $\max_{p \in \mathcal{C}} R_p(D)$,
and Shannon's rate distortion theorem gives $R(D) \ge \tilde{R}(D)$.

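The switcher will use a memoryless randomized strategy.
Let $\mathcal{V} \subseteq \mathcal{X}$ and suppose that at some time $k$ the set of symbols
available to choose from for the switcher is exactly $\mathcal{V}$. That is,
$\{x_{1,k}, \ldots, x_{m,k}\} = \mathcal{V}$. Define
$\beta(\mathcal{V}) = P(\{x_{1,1}, \ldots, x_{m,1}\} = \mathcal{V})$
to be the probability that at any time the switcher can
choose any element of $\mathcal{V}$ and no other symbols. Then let
$f(i|\mathcal{V})$ be a probability distribution on $\mathcal{X}$ supported on $\mathcal{V}$,
i.e. $f(i|\mathcal{V}) \ge 0$ for all $i \in \mathcal{X}$, $f(i|\mathcal{V}) = 0$ if $i \notin \mathcal{V}$, and
$\sum_{i \in \mathcal{V}} f(i|\mathcal{V}) = 1$. The switcher will have such a randomized
rule for every nonempty subset $\mathcal{V}$ of $\mathcal{X}$ such that $|\mathcal{V}| \le m$.
Let $\mathcal{D}$ be the set of distributions on $\mathcal{X}$ that can be achieved
with these kinds of rules, so

$$\mathcal{D} = \left\{ p \in \mathcal{P} \,:\, p(\cdot) = \sum_{\mathcal{V} \subseteq \mathcal{X},\, |\mathcal{V}| \le m} \beta(\mathcal{V}) f(\cdot|\mathcal{V}), \ \forall\, \mathcal{V} \text{ s.t. } \mathcal{V} \subseteq \mathcal{X},\ |\mathcal{V}| \le m,\ f(\cdot|\mathcal{V}) \text{ is a PMF on } \mathcal{V} \right\}$$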
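For small alphabets, $\beta(\mathcal{V})$ and the distribution induced by a rule $f(\cdot|\mathcal{V})$ can be computed exactly by enumerating what the $m$ sources emit at one time step. The sketch below makes several assumptions for illustration: the function names, the use of `frozenset` keys for $\mathcal{V}$, 0-based letters, and a `rule` callback that returns $f(\cdot|\mathcal{V})$ as a dictionary.

```python
from fractions import Fraction
from itertools import product

def availability_probs(source_pmfs):
    """beta(V): probability that the set of letters emitted by the m sources
    at one time step is exactly V (V is returned as a frozenset)."""
    alphabet = range(len(source_pmfs[0]))
    beta = {}
    for letters in product(alphabet, repeat=len(source_pmfs)):
        prob = Fraction(1)
        for pmf, x in zip(source_pmfs, letters):
            prob *= pmf[x]
        V = frozenset(letters)
        beta[V] = beta.get(V, 0) + prob
    return beta

def induced_distribution(source_pmfs, rule):
    """p(i) = sum_V beta(V) f(i|V), where rule(V) returns {letter: f(letter|V)}."""
    p = [Fraction(0)] * len(source_pmfs[0])
    for V, b in availability_probs(source_pmfs).items():
        for i, fi in rule(V).items():
            p[i] += b * fi
    return p
```

For two binary sources that emit 1 with probabilities 1/3 and 1/4, the rule `lambda V: {max(V): Fraction(1)}` (always output the largest available letter) yields the uniform distribution.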



It is clear from the construction of $\mathcal{D}$ that $\mathcal{D} \subseteq \mathcal{C}$ because
the conditions in $\mathcal{C}$ only prevent the switcher
from producing symbols that do not occur often enough, and put no
further restrictions on the switcher. So we need only show that
$\mathcal{C} \subseteq \mathcal{D}$. The following gives such a proof by contradiction.

Lemma 3 (Achievability for switcher): The set relation
$\mathcal{C} \subseteq \mathcal{D}$ is true.

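Proof: Suppose $p \in \mathcal{C}$ but $p \notin \mathcal{D}$. It is clear that $\mathcal{D}$ is a
convex set. Let us view the probability simplex in $\mathbb{R}^{|\mathcal{X}|}$. Since
$\mathcal{D}$ is a convex set, there is a hyperplane through $p$ that does not
intersect $\mathcal{D}$. Hence, there is a vector $(a_1, \ldots, a_{|\mathcal{X}|})$ such that
$\sum_{i=1}^{|\mathcal{X}|} a_i p(i) = t$ for some real $t$ but
$t < \min_{q \in \mathcal{D}} \sum_{i=1}^{|\mathcal{X}|} a_i q(i)$.
Without loss of generality, assume $a_1 \ge a_2 \ge \cdots \ge a_{|\mathcal{X}|}$
(otherwise permute symbols). Now, we will construct $f(\cdot|\mathcal{V})$
so that the resulting $q$ has
$\sum_{i=1}^{|\mathcal{X}|} a_i p(i) \ge \sum_{i=1}^{|\mathcal{X}|} a_i q(i)$, which
contradicts the initial assumption. Let

$$f(i|\mathcal{V}) \triangleq \begin{cases} 1 & \text{if } i = \max(\mathcal{V}) \\ 0 & \text{else} \end{cases}$$

For example, if $\mathcal{V} = \{1, 5, 6, 9\}$, then $f(9|\mathcal{V}) = 1$ and
$f(i|\mathcal{V}) = 0$ if $i \ne 9$. Call $q$ the distribution on $\mathcal{X}$ induced by
this choice of $f(\cdot|\mathcal{V})$. Recall that
$Q(\mathcal{V}) = \prod_{l=1}^{m} \sum_{i \in \mathcal{V}} p_l(i)$.
Then, we have

$$\sum_{i=1}^{|\mathcal{X}|} a_i q(i) = a_1 Q(\{1\}) + a_2 \left[Q(\{1, 2\}) - Q(\{1\})\right] + \cdots + a_{|\mathcal{X}|} \left[Q(\{1, \ldots, |\mathcal{X}|\}) - Q(\{1, \ldots, |\mathcal{X}| - 1\})\right]$$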

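By the constraints in the definition of $\mathcal{C}$, we have the
following inequalities for $p$:

$$\begin{aligned}
p(1) &\ge Q(\{1\}) = q(1) \\
p(1) + p(2) &\ge Q(\{1, 2\}) = q(1) + q(2) \\
&\ \ \vdots \\
\sum_{i=1}^{|\mathcal{X}|-1} p(i) &\ge Q(\{1, \ldots, |\mathcal{X}| - 1\}) = \sum_{i=1}^{|\mathcal{X}|-1} q(i)
\end{aligned}$$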

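Therefore, the difference of the objective is

$$\begin{aligned}
\sum_{i=1}^{|\mathcal{X}|} a_i (p(i) - q(i)) &= a_{|\mathcal{X}|} \left[\sum_{i=1}^{|\mathcal{X}|} p(i) - q(i)\right] + (a_{|\mathcal{X}|-1} - a_{|\mathcal{X}|}) \left[\sum_{i=1}^{|\mathcal{X}|-1} p(i) - q(i)\right] + \cdots + (a_1 - a_2) \left[p(1) - q(1)\right] \\
&= \sum_{i=1}^{|\mathcal{X}|-1} (a_i - a_{i+1}) \left[\sum_{j=1}^{i} p(j) - \sum_{j=1}^{i} q(j)\right] \\
&\ge 0
\end{aligned}$$

The first bracket vanishes because $p$ and $q$ both sum to one.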







The last step is true because of the monotonicity in the $a_i$
and the inequalities we derived earlier. Therefore, we see that
$\sum_{i=1}^{|\mathcal{X}|} a_i p(i) \ge \sum_{i=1}^{|\mathcal{X}|} a_i q(i)$ for the $p$ we had chosen at the
beginning of the proof. This contradicts the assumption that
$\sum_{i=1}^{|\mathcal{X}|} a_i p(i) < \min_{q \in \mathcal{D}} \sum_{i=1}^{|\mathcal{X}|} a_i q(i)$; therefore it must be that
$\mathcal{C} \subseteq \mathcal{D}$. ■
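The key facts used in the proof, that the 'always output the largest available letter' rule has prefix sums equal to $Q(\{1, \ldots, i\})$ and that it minimizes $\sum_i a_i q(i)$ against points of $\mathcal{C}$ when the $a_i$ are nonincreasing, can be checked on a small instance. The sketch below is illustrative: the function names, the two ternary sources in the usage note and the weights are assumptions, and letters are 0-indexed rather than 1-indexed.

```python
from fractions import Fraction
from itertools import product

def max_rule_distribution(source_pmfs):
    """Distribution q induced when the switcher always outputs the largest
    available letter (letters are 0-indexed here)."""
    size = len(source_pmfs[0])
    q = [Fraction(0)] * size
    for letters in product(range(size), repeat=len(source_pmfs)):
        prob = Fraction(1)
        for pmf, x in zip(source_pmfs, letters):
            prob *= pmf[x]
        q[max(letters)] += prob
    return q

def prefix_Q(source_pmfs, i):
    """Q({0, ..., i}): probability that every source emits a letter <= i."""
    result = Fraction(1)
    for pmf in source_pmfs:
        result *= sum(pmf[: i + 1])
    return result

def weighted_sum(a, p):
    return sum(ai * pi for ai, pi in zip(a, p))
```

As a usage example, take two ternary sources with distributions (1/2, 1/3, 1/6) and (1/4, 1/4, 1/2) and weights $a = (3, 2, 1)$. Each source distribution lies in $\mathcal{C}$, and its weighted sum is at least that of the max-rule distribution $q = (1/8, 7/24, 7/12)$.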

VII. Conclusion 

The rate-distortion function for the 'cheating' switcher has 
been described. It is the maximization of the IID rate-distortion 
function over the distributions the switcher can simulate. It was 
assumed the switcher had access to all source outputs ahead of 
time, but the proof required only that the switcher had access 
to the source realizations for one step ahead at each time. 

In this paper, the sources were independent and memoryless. 
A minor tweak to the argument also gets the rate-distortion 
function if the sources are dependent but still memoryless. 
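The region $\mathcal{C}$ would just be modified to become:

$$\mathcal{C} = \left\{ p \in \mathcal{P} \,:\, \sum_{i \in \mathcal{V}} p(i) \ge P(x_{l,1} \in \mathcal{V}, \ 1 \le l \le m), \ \forall\, \mathcal{V} \text{ such that } \mathcal{V} \subseteq \mathcal{X} \right\}$$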

A more interesting problem is to consider what happens 
when the sources are independent but have memory. Apparently,
Dobrushin [7] has analyzed the case of the non-cheating
switcher with independent sources with memory. One could 
imagine that, perhaps, giving the switcher access to all source 
realizations could result in the ability to simulate memoryless 
sources from a collection of sources with memory. 

Similar techniques might also prove useful in considering a 
cheating 'jammer' for an arbitrarily varying channel. While the 
problem is mathematically well defined, it seems unphysical 
in the usual context of jamming or channel noise. The idea 
may make more sense in the context of watermarking, where 
the adversary can try many different attacks on different letters 
of the input before deciding to choose one for each. 